Scenario: an Indian charity aims to identify the most and least populous Indian cities and to compare their male vs. female graduates, literates and children. This helps it prioritize each city's needs for improving education, such as building schools and equipping them with everything they need, including technology. The organization then analyzes the visual comparisons shown below to support its decision-making and to identify which city should be developed first.
We use a tabular dataset.
Attributes:
• 'name_of_city': Name of the city (categorical attribute, string values, 493 levels)
• 'state_code': State code of the city (numeric attribute, integer values, range 1 to 35)
• 'state_name': State name of the city (categorical attribute, string values, 29 levels)
• 'dist_code': District code where the city belongs (numeric attribute, integer values, range 1 to 99)
• 'population_total': Total population (numeric attribute, integer values, range 100036 to 12478447)
• 'population_male': Male population (numeric attribute, integer values, range 50201 to 6736815)
• 'population_female': Female population (numeric attribute, integer values, range 45126 to 5741632)
• '0-6_population_total': Total population aged 0-6 (numeric attribute, integer values, range 6547 to 1209275)
• '0-6_population_male': Male population aged 0-6 (numeric attribute, integer values, range 3406 to 647938)
• '0-6_population_female': Female population aged 0-6 (numeric attribute, integer values, range 3107 to 561337)
• 'literates_total': Total literates (numeric attribute, integer values, range 56998 to 10237586)
• 'literates_male': Male literates (numeric attribute, integer values, range 34751 to 5727774)
• 'literates_female': Female literates (numeric attribute, integer values, range 22247 to 4509812)
• 'sex_ratio': Sex ratio, females per 1000 males (numeric attribute, integer values, range 700 to 1093)
• 'child_sex_ratio': Sex ratio for ages 0-6 (numeric attribute, integer values, range 762 to 1185)
• 'effective_literacy_rate_total': Literacy rate for ages 7 and above (numeric attribute, float values, range 49.51 to 98.8)
• 'effective_literacy_rate_male': Male literacy rate for ages 7 and above (numeric attribute, float values, range 52.27 to 99.3)
• 'effective_literacy_rate_female': Female literacy rate for ages 7 and above (numeric attribute, float values, range 46.45 to 98.31)
• 'location': "lat,lng" coordinates of the city (categorical attribute, string values, 490 levels)
• 'total_graduates': Total number of graduates (numeric attribute, integer values, range 2532 to 2221137)
• 'male_graduates': Male graduates (numeric attribute, integer values, range 1703 to 1210040)
• 'female_graduates': Female graduates (numeric attribute, integer values, range 829 to 1011097)
Items: each row represents one city.
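The ratio columns can be cross-checked against the count columns. The sketch below assumes the usual Indian census definitions: sex ratio is females per 1000 males, and the effective literacy rate is literates divided by the population aged 7 and above (total population minus the 0-6 population). The helper names are hypothetical; the input numbers are the Abohar row shown by df.head(), and they reproduce its sex_ratio, child_sex_ratio and effective_literacy_rate_female values.

```python
def sex_ratio(males: int, females: int) -> int:
    """Females per 1000 males, rounded to the nearest integer."""
    return round(females / males * 1000)


def effective_literacy_rate(literates: int, population: int, population_0_6: int) -> float:
    """Percentage of literates among people aged 7 and above."""
    population_7_plus = population - population_0_6
    return round(literates / population_7_plus * 100, 2)


# Abohar values taken from the first row of the dataset
print(sex_ratio(76840, 68398))                      # -> 890 (sex_ratio column)
print(sex_ratio(8587, 7283))                        # -> 848 (child_sex_ratio column)
print(effective_literacy_rate(44972, 68398, 7283))  # -> 73.59 (effective_literacy_rate_female)
```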
1) Distribution plots of literacy rates (total, male & female)? (Salma)
2) Cities with the highest sex ratio (top 20) on a map?
3) Relations between some columns?
4) How to find repeated values in each column?
5) Comparison between the number of cities in each state (static)? (Ayya)
6) Is there any relation between sex ratio and literacy rates? Using a scatter plot?
7) What is the percentage of each state_name in the first 30 rows of the data?
8) Cities with the highest literacy rates (top 20)?
9) Graduates' distribution in the top 20 cities, total (ordered by the greatest number of graduates)? Using a bar plot? (Rahma)
10) Graduates' distribution in the top 20 cities, male & female (ordered by the greatest number of graduates)? Using a bar plot?
11) Graduates' distribution in the top 20 cities, total, male & female (ordered by the greatest number of graduates)? Using a bar plot?
12) Top 10 states with the highest population? Using a bar plot?
13) Is there any relationship between columns?
14) Male vs. female graduates in each city (static) (Ahmed)
15) Comparison between total population, population younger than 6, total graduates and total literates in each city (static)
16) Cities with the highest sex ratio (top 20)
The plots we use, and why we chose them, are discussed below.
# import all packages and set plots to be embedded inline
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff
# note: the trendline='ols' option used below also requires the statsmodels package

warnings.filterwarnings("ignore")
%matplotlib inline
The data has 493 rows and 22 columns describing the top 500 Indian cities.
Our main goal is to extract as much information as possible about these cities.
Columns: name_of_city, state_code, state_name, dist_code, population_total, population_male, population_female, 0-6_population_total, 0-6_population_male, 0-6_population_female, literates_total, literates_male, literates_female, sex_ratio, child_sex_ratio, effective_literacy_rate_total, effective_literacy_rate_male, effective_literacy_rate_female, location, total_graduates, male_graduates, female_graduates
Load the dataset
# read data
df = pd.read_csv("cities_r2.csv")
df.head(5)
| | name_of_city | state_code | state_name | dist_code | population_total | population_male | population_female | 0-6_population_total | 0-6_population_male | 0-6_population_female | ... | literates_female | sex_ratio | child_sex_ratio | effective_literacy_rate_total | effective_literacy_rate_male | effective_literacy_rate_female | location | total_graduates | male_graduates | female_graduates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abohar | 3 | PUNJAB | 9 | 145238 | 76840 | 68398 | 15870 | 8587 | 7283 | ... | 44972 | 890 | 848 | 79.86 | 85.49 | 73.59 | 30.1452928,74.1993043 | 16287 | 8612 | 7675 |
| 1 | Achalpur | 27 | MAHARASHTRA | 7 | 112293 | 58256 | 54037 | 11810 | 6186 | 5624 | ... | 43086 | 928 | 909 | 91.99 | 94.77 | 89.00 | 21.257584,77.5086754 | 8863 | 5269 | 3594 |
| 2 | Adilabad | 28 | ANDHRA PRADESH | 1 | 117388 | 59232 | 58156 | 13103 | 6731 | 6372 | ... | 37660 | 982 | 947 | 80.51 | 88.18 | 72.73 | 19.0809075,79.560344 | 10565 | 6797 | 3768 |
| 3 | Adityapur | 20 | JHARKHAND | 24 | 173988 | 91495 | 82493 | 23042 | 12063 | 10979 | ... | 54515 | 902 | 910 | 83.46 | 89.98 | 76.23 | 22.7834741,86.1576889 | 19225 | 12189 | 7036 |
| 4 | Adoni | 28 | ANDHRA PRADESH | 21 | 166537 | 82743 | 83794 | 18406 | 9355 | 9051 | ... | 45089 | 1013 | 968 | 68.38 | 76.58 | 60.33 | 15.6322227,77.2728368 | 11902 | 7871 | 4031 |
5 rows × 22 columns
# statistical description of data
df.describe()
| | state_code | dist_code | population_total | population_male | population_female | 0-6_population_total | 0-6_population_male | 0-6_population_female | literates_total | literates_male | literates_female | sex_ratio | child_sex_ratio | effective_literacy_rate_total | effective_literacy_rate_male | effective_literacy_rate_female | total_graduates | male_graduates | female_graduates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 493.000000 | 493.000000 | 4.930000e+02 | 4.930000e+02 | 4.930000e+02 | 4.930000e+02 | 493.000000 | 493.000000 | 4.930000e+02 | 4.930000e+02 | 4.930000e+02 | 493.000000 | 493.000000 | 493.000000 | 493.000000 | 493.000000 | 4.930000e+02 | 4.930000e+02 | 4.930000e+02 |
| mean | 18.643002 | 16.782961 | 4.481124e+05 | 2.343468e+05 | 2.137656e+05 | 4.709285e+04 | 24849.527383 | 22243.320487 | 3.461527e+05 | 1.894384e+05 | 1.567143e+05 | 930.294118 | 902.332657 | 85.131460 | 89.920162 | 79.967181 | 6.620236e+04 | 3.771556e+04 | 2.848680e+04 |
| std | 9.297168 | 15.566131 | 1.033228e+06 | 5.487786e+05 | 4.848622e+05 | 1.050279e+05 | 55535.310272 | 49523.241379 | 8.220952e+05 | 4.534753e+05 | 3.690677e+05 | 55.849106 | 49.794689 | 6.186345 | 5.377492 | 7.577825 | 1.778187e+05 | 9.849574e+04 | 7.951556e+04 |
| min | 1.000000 | 1.000000 | 1.000360e+05 | 5.020100e+04 | 4.512600e+04 | 6.547000e+03 | 3406.000000 | 3107.000000 | 5.699800e+04 | 3.475100e+04 | 2.224700e+04 | 700.000000 | 762.000000 | 49.510000 | 52.270000 | 46.450000 | 2.532000e+03 | 1.703000e+03 | 8.290000e+02 |
| 25% | 9.000000 | 7.000000 | 1.261420e+05 | 6.638400e+04 | 6.041100e+04 | 1.363900e+04 | 7221.000000 | 6457.000000 | 9.768700e+04 | 5.357800e+04 | 4.391400e+04 | 890.000000 | 868.000000 | 81.750000 | 87.280000 | 75.800000 | 1.527700e+04 | 9.289000e+03 | 6.114000e+03 |
| 50% | 19.000000 | 13.000000 | 1.841330e+05 | 9.665500e+04 | 8.776800e+04 | 1.944000e+04 | 10342.000000 | 9172.000000 | 1.413290e+05 | 7.590600e+04 | 6.383600e+04 | 922.000000 | 903.000000 | 85.970000 | 91.180000 | 80.920000 | 2.395900e+04 | 1.404900e+04 | 9.558000e+03 |
| 75% | 27.000000 | 21.000000 | 3.490330e+05 | 1.750550e+05 | 1.700260e+05 | 3.794500e+04 | 19982.000000 | 17954.000000 | 2.679000e+05 | 1.455480e+05 | 1.235030e+05 | 971.000000 | 942.000000 | 89.330000 | 93.400000 | 85.400000 | 5.036700e+04 | 2.787200e+04 | 2.086600e+04 |
| max | 35.000000 | 99.000000 | 1.247845e+07 | 6.736815e+06 | 5.741632e+06 | 1.209275e+06 | 647938.000000 | 561337.000000 | 1.023759e+07 | 5.727774e+06 | 4.509812e+06 | 1093.000000 | 1185.000000 | 98.800000 | 99.300000 | 98.310000 | 2.221137e+06 | 1.210040e+06 | 1.011097e+06 |
# information about the data (column names, data types)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493 entries, 0 to 492
Data columns (total 22 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   name_of_city                    493 non-null    object
 1   state_code                      493 non-null    int64
 2   state_name                      493 non-null    object
 3   dist_code                       493 non-null    int64
 4   population_total                493 non-null    int64
 5   population_male                 493 non-null    int64
 6   population_female               493 non-null    int64
 7   0-6_population_total            493 non-null    int64
 8   0-6_population_male             493 non-null    int64
 9   0-6_population_female           493 non-null    int64
 10  literates_total                 493 non-null    int64
 11  literates_male                  493 non-null    int64
 12  literates_female                493 non-null    int64
 13  sex_ratio                       493 non-null    int64
 14  child_sex_ratio                 493 non-null    int64
 15  effective_literacy_rate_total   493 non-null    float64
 16  effective_literacy_rate_male    493 non-null    float64
 17  effective_literacy_rate_female  493 non-null    float64
 18  location                        493 non-null    object
 19  total_graduates                 493 non-null    int64
 20  male_graduates                  493 non-null    int64
 21  female_graduates                493 non-null    int64
dtypes: float64(3), int64(16), object(3)
memory usage: 84.9+ KB
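# check for fully duplicated rows
# (the original test "duplicate.all==True" compared the method object itself to True, so it was always False)
duplicate = df.duplicated()
if duplicate.any():
    print("duplicated")
else:
    print("not duplicated")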
not duplicated
# display the number of rows and columns
df.shape
(493, 22)
We split the values in the location column into latitude and longitude, stored in two new columns (lat, lon), so they can be used in maps.
# split the values in location into (lat, lon)
df[['lat','lon']] = df.location.str.split(",", expand=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493 entries, 0 to 492
Data columns (total 24 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   name_of_city                    493 non-null    object
 1   state_code                      493 non-null    int64
 2   state_name                      493 non-null    object
 3   dist_code                       493 non-null    int64
 4   population_total                493 non-null    int64
 5   population_male                 493 non-null    int64
 6   population_female               493 non-null    int64
 7   0-6_population_total            493 non-null    int64
 8   0-6_population_male             493 non-null    int64
 9   0-6_population_female           493 non-null    int64
 10  literates_total                 493 non-null    int64
 11  literates_male                  493 non-null    int64
 12  literates_female                493 non-null    int64
 13  sex_ratio                       493 non-null    int64
 14  child_sex_ratio                 493 non-null    int64
 15  effective_literacy_rate_total   493 non-null    float64
 16  effective_literacy_rate_male    493 non-null    float64
 17  effective_literacy_rate_female  493 non-null    float64
 18  location                        493 non-null    object
 19  total_graduates                 493 non-null    int64
 20  male_graduates                  493 non-null    int64
 21  female_graduates                493 non-null    int64
 22  lat                             493 non-null    object
 23  lon                             493 non-null    object
dtypes: float64(3), int64(16), object(5)
memory usage: 92.6+ KB
Next, we convert the lat and lon columns from object (string) to numeric, because the map functions need numeric coordinates.
df['lat'] = pd.to_numeric(df['lat'])
df['lon'] = pd.to_numeric(df['lon'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493 entries, 0 to 492
Data columns (total 24 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   name_of_city                    493 non-null    object
 1   state_code                      493 non-null    int64
 2   state_name                      493 non-null    object
 3   dist_code                       493 non-null    int64
 4   population_total                493 non-null    int64
 5   population_male                 493 non-null    int64
 6   population_female               493 non-null    int64
 7   0-6_population_total            493 non-null    int64
 8   0-6_population_male             493 non-null    int64
 9   0-6_population_female           493 non-null    int64
 10  literates_total                 493 non-null    int64
 11  literates_male                  493 non-null    int64
 12  literates_female                493 non-null    int64
 13  sex_ratio                       493 non-null    int64
 14  child_sex_ratio                 493 non-null    int64
 15  effective_literacy_rate_total   493 non-null    float64
 16  effective_literacy_rate_male    493 non-null    float64
 17  effective_literacy_rate_female  493 non-null    float64
 18  location                        493 non-null    object
 19  total_graduates                 493 non-null    int64
 20  male_graduates                  493 non-null    int64
 21  female_graduates                493 non-null    int64
 22  lat                             493 non-null    float64
 23  lon                             493 non-null    float64
dtypes: float64(5), int64(16), object(3)
memory usage: 92.6+ KB
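The split and the numeric conversion can be folded into one helper. This is a minimal sketch, assuming every location value has the form "lat,lng" as in the rows shown by df.head(); errors="coerce" turns a malformed value into NaN instead of raising, so bad rows can be inspected afterwards. The helper name parse_location is hypothetical.

```python
import pandas as pd


def parse_location(location: pd.Series) -> pd.DataFrame:
    """Split 'lat,lng' strings into numeric lat and lon columns."""
    parts = location.str.split(",", expand=True)
    return pd.DataFrame({
        # to_numeric with errors="coerce" gives NaN for values that are not numbers
        "lat": pd.to_numeric(parts[0], errors="coerce"),
        "lon": pd.to_numeric(parts[1], errors="coerce"),
    })


# Locations of Abohar and Achalpur from the dataset
coords = parse_location(pd.Series(["30.1452928,74.1993043", "21.257584,77.5086754"]))
print(coords.dtypes)
print(coords)
```

On the notebook data this would be used as df[["lat", "lon"]] = parse_location(df["location"]).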
We use a bar plot because it displays the relationship between a numeric and a categorical variable (here, the number of cities in each state).
The state UTTAR PRADESH has the highest number of cities.
#size of the graph
plt.figure(figsize=(18,9));
df=df.sort_values('state_name');
#we use countplot because it shows the number of observations in each
#categorical bin as bars
sns.countplot(data=df, x='state_name',color='red',order=df.state_name.value_counts().index);
#y axis label
plt.ylabel("counts",size=25);
plt.yticks(size=15);
#x axis label
plt.xlabel("states",size=25);
plt.xticks(rotation=90,fontsize=15);
#plot label
plt.title("number of cities in each state",size = 25);
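Question 7 (the share of each state in the first 30 rows) is not plotted above. This is a minimal sketch, assuming "first 30 rows" means the file order produced by pd.read_csv; the cell above re-sorts df by state_name, so the helper should be applied to a freshly loaded frame. The helper name state_share and the demo frame are hypothetical.

```python
import pandas as pd


def state_share(frame: pd.DataFrame, n_rows: int = 30) -> pd.Series:
    """Percentage of each state_name among the first n_rows rows of frame."""
    # normalize=True returns proportions; multiply by 100 for percentages
    shares = frame.head(n_rows)["state_name"].value_counts(normalize=True) * 100
    return shares.round(2)


# Tiny illustrative frame (not the real data)
demo = pd.DataFrame({"state_name": ["PUNJAB", "MAHARASHTRA", "ANDHRA PRADESH",
                                    "JHARKHAND", "ANDHRA PRADESH"]})
print(state_share(demo, n_rows=4))
```

With the real data, state_share(pd.read_csv("cities_r2.csv")) would give the percentages, and px.pie could display them.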
# sort by population_total (descending) and take the 20 most populous cities
df=df.sort_values('population_total',ascending=False)
top20=df.head(20)
px.scatter(top20, x="population_male", y="population_female", color="name_of_city",size='population_total')
fig = px.scatter(
top20, x='literates_female', y='sex_ratio', opacity=0.65,
trendline='ols', trendline_color_override='darkblue'
)
fig.show()
The OLS trendline in the figure suggests a linear trend between sex ratio and the number of female literates (literates_female) in the 20 most populous cities.
fig = px.scatter(
top20, x='literates_male', y='sex_ratio', opacity=0.65,
trendline='ols', trendline_color_override='darkblue'
)
fig.show()
The OLS trendline in the figure suggests a linear trend between sex ratio and the number of male literates (literates_male) in the 20 most populous cities.
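The trendline='ols' option fits an ordinary least-squares line. The sketch below reproduces that kind of fit with NumPy and adds the Pearson correlation coefficient, which measures how close the points are to a straight line; a trendline drawn through 20 points does not by itself show a strong relationship. The helper name ols_fit is hypothetical and the example numbers are made up.

```python
import numpy as np


def ols_fit(x, y):
    """Return slope, intercept and Pearson r of the least-squares line y = slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit with deg=1 returns the coefficients from highest degree down: slope, intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Made-up example: y = 2x + 1 exactly, so r is 1
slope, intercept, r = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(round(slope, 3), round(intercept, 3), round(r, 3))
```

On the notebook data the call would be ols_fit(top20["literates_female"], top20["sex_ratio"]).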
top20=df.head(20)
# plot the 20 most populous cities on a map, sized and colored by population_total
# zoom=3 shows the whole country; zoom=10 only shows a small area around the map center
fig = px.scatter_mapbox(top20, lat="lat", lon="lon", color="population_total", size="population_total",
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=3,
                        mapbox_style="carto-positron")
fig.show()
# plot effective_literacy_rate_total of the same 20 cities on a map
fig = px.scatter_mapbox(top20, lat="lat", lon="lon", color="effective_literacy_rate_total", size="effective_literacy_rate_total",
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=3,
                        mapbox_style="carto-positron")
fig.show()
# plot sex_ratio of the same 20 cities on a map
fig = px.scatter_mapbox(top20, lat="lat", lon="lon", color="sex_ratio", size="sex_ratio",
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=3,
                        mapbox_style="carto-positron")
fig.show()
We use a scatter plot matrix (SPLOM) to see the pairwise relationships between population_total, 0-6_population_total, effective_literacy_rate_total and total_graduates, coloring the points by the name_of_city column.
fig = px.scatter_matrix(df,
dimensions=["population_total", "0-6_population_total", "effective_literacy_rate_total", "total_graduates"],
color="name_of_city");
fig.show()
The matrix shows every pairwise relationship between these four variables in a single view.
We plot a histogram for each numeric column because histograms are the most common way to show a frequency distribution. Note that at this point top20 holds the 20 most populous cities.
top20.hist(figsize=(20,20));
The histograms show how often the values of each numeric column fall into each range (bin).
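Question 4 asks how to find repeated values in each column, while histograms only bin the values. This is a minimal sketch that, for every column, counts how many distinct values occur more than once and reports the most frequent value with Series.value_counts (which ignores NaN by default). The helper name repeated_values and the demo frame are hypothetical.

```python
import pandas as pd


def repeated_values(frame: pd.DataFrame) -> pd.DataFrame:
    """For each column: number of distinct values that appear more than once,
    plus the most frequent value and its count."""
    rows = []
    for column in frame.columns:
        counts = frame[column].value_counts()  # sorted by count, descending
        rows.append({
            "column": column,
            "n_repeated_values": int((counts > 1).sum()),
            "most_common": counts.index[0],
            "most_common_count": int(counts.iloc[0]),
        })
    return pd.DataFrame(rows).set_index("column")


demo = pd.DataFrame({
    "state_name": ["PUNJAB", "ANDHRA PRADESH", "ANDHRA PRADESH", "JHARKHAND"],
    "sex_ratio": [890, 982, 1013, 902],
})
print(repeated_values(demo))
```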
# columns to plot
hist_data=[df.effective_literacy_rate_total,df.effective_literacy_rate_male,df.effective_literacy_rate_female]
# label for each column
group_labels=["effective_literacy_rate_total","effective_literacy_rate_male","effective_literacy_rate_female"]
colors = ['#333F44', '#37AA9C', '#94F3E4']
# create_distplot draws a histogram, a KDE curve and a rug plot for each univariate set of observations
fig = ff.create_distplot(hist_data, group_labels, colors=colors)
# add title
fig.update_layout(title_text='Literacy rate distribution of top 500 Indian cities')
fig.show()
The distributions show that the male literacy rate is generally higher than the female literacy rate (mean 89.92 vs. 79.97 in df.describe()).
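The gap can also be measured city by city instead of by comparing the two distributions. This is a minimal sketch that computes the male minus female literacy rate (in percentage points) for each row and the share of cities where males are ahead. The helper name literacy_gap is hypothetical; the demo values are the Abohar, Achalpur and Adilabad rows from df.head().

```python
import pandas as pd


def literacy_gap(frame: pd.DataFrame) -> pd.Series:
    """Male minus female effective literacy rate, in percentage points, per row."""
    return frame["effective_literacy_rate_male"] - frame["effective_literacy_rate_female"]


# Abohar, Achalpur and Adilabad values from df.head()
demo = pd.DataFrame({
    "effective_literacy_rate_male": [85.49, 94.77, 88.18],
    "effective_literacy_rate_female": [73.59, 89.00, 72.73],
})
gap = literacy_gap(demo)
print(gap.round(2).tolist())
print("share of cities where males are ahead:", (gap > 0).mean())
```

On the full data, literacy_gap(df).describe() would summarize the gap across all 493 cities.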
# columns to plot side by side; top20 currently holds the 20 most populous cities
group=['population_female','population_male']
# we use a bar plot because it displays the relationship
# between a numeric and a categorical variable (one bar per gender for each city)
fig = px.bar(top20, x="name_of_city",y=group, barmode="group")
fig.show()
# sort by total_graduates (descending) and take the 20 cities with the most graduates
df=df.sort_values("total_graduates",ascending=False)
top20_graduates=df.head(20)
# we use barplot because the barplot is used to display the relationship
#between a numeric and a categorical variable
fig=px.bar(top20_graduates,x="name_of_city",y="total_graduates",barmode="group")
fig.show()
# sort by total_graduates (descending) and take the first 20 values
df=df.sort_values("total_graduates",ascending=False)
top20=df.head(20)
# grouping male_graduates , female_graduates
group=['male_graduates','female_graduates']
# we use barplot because the barplot is used to display the relationship
#between a numeric and a categorical variable
fig = px.bar(top20, y=group, x="name_of_city", barmode="group")
fig.show()
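Raw bar heights mostly reflect city size. A size-independent comparison is the female share of graduates in each city. This is a minimal sketch, assuming (as holds for the rows shown by df.head()) that male_graduates + female_graduates equals total_graduates. The helper name is hypothetical; the demo values are the Abohar and Achalpur rows.

```python
import pandas as pd


def female_graduate_share(frame: pd.DataFrame) -> pd.Series:
    """Female graduates as a percentage of all graduates, per row."""
    return frame["female_graduates"] / frame["total_graduates"] * 100


# Abohar and Achalpur rows from df.head()
demo = pd.DataFrame({
    "name_of_city": ["Abohar", "Achalpur"],
    "total_graduates": [16287, 8863],
    "male_graduates": [8612, 5269],
    "female_graduates": [7675, 3594],
})
# In these rows male + female graduates add up to the total
assert (demo["male_graduates"] + demo["female_graduates"] == demo["total_graduates"]).all()
print(female_graduate_share(demo).round(2).tolist())
```

A bar chart of female_graduate_share(top20) against top20["name_of_city"] would then compare cities on an equal footing.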
# sort by total_graduates (descending) and take the first 20 values
df=df.sort_values("total_graduates",ascending=False)
top20=df.head(20)
# grouping male_graduates,female_graduates,total_graduates
group=['male_graduates','female_graduates','total_graduates']
# we use barplot because the barplot is used to display the relationship
#between a numeric and a categorical variable
fig = px.bar(top20, y=group, x="name_of_city", barmode="group")
fig.show()
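We use a heatmap because it shows the whole correlation matrix as a two-dimensional color grid, which makes relationships between columns easier to see.
plt.figure(figsize=(18, 9))
# numeric_only=True skips text columns such as name_of_city (corr() raises on them in pandas >= 2.0)
heatmap = sns.heatmap(df.corr(numeric_only=True), annot=True)
heatmap.set_title('correlation between columns', fontdict={'fontsize':15});
The annotated coefficients show which columns move together; the count columns (population, literates, graduates) all grow with city size, so they are expected to be strongly correlated with one another.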
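With more than 20 numeric columns the heatmap is dense, so it can help to list the strongest pairs explicitly. This is a minimal sketch that keeps the upper triangle of the correlation matrix (each pair once, no diagonal) and sorts the pairs by absolute correlation. The helper name top_correlated_pairs is hypothetical; the demo values are from the first five rows of df.head().

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(frame: pd.DataFrame, k: int = 5) -> pd.Series:
    """Return the k column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr(numeric_only=True)
    # Upper triangle without the diagonal, so every pair appears exactly once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.loc[pairs.abs().sort_values(ascending=False).index].head(k)


demo = pd.DataFrame({
    "population_total": [145238, 112293, 117388, 173988, 166537],
    "population_male": [76840, 58256, 59232, 91495, 82743],
    "sex_ratio": [890, 928, 982, 902, 1013],
})
print(top_correlated_pairs(demo, k=2))
```

On the notebook data the call would be top_correlated_pairs(df, k=10).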
# sort by population_total (descending) and take the 10 most populous cities
df=df.sort_values('population_total',ascending=False)
top10=df.head(10)
# bar plot of the 10 most populous cities on a categorical state axis:
# cities from the same state are stacked into one bar
fig = px.bar(top10, x="state_name", y="population_total",
             pattern_shape_sequence=[".", "x", "+"])
fig.show()
# scatter each literacy rate against sex ratio (question 6)
n=['effective_literacy_rate_total','effective_literacy_rate_male','effective_literacy_rate_female']
fig = px.scatter(df, x="sex_ratio", y=n)
fig.show()
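The scatter plot answers question 6 visually. A single number per pair can complement it: the Pearson correlation of sex_ratio with each literacy rate, where values near 0 mean no linear relationship (Pearson does not capture non-linear patterns). This is a minimal sketch using DataFrame.corrwith; the helper name is hypothetical and the demo values are the five rows shown by df.head().

```python
import pandas as pd

LITERACY_COLUMNS = ["effective_literacy_rate_total",
                    "effective_literacy_rate_male",
                    "effective_literacy_rate_female"]


def sex_ratio_literacy_corr(frame: pd.DataFrame) -> pd.Series:
    """Pearson correlation of sex_ratio with each effective literacy rate column."""
    return frame[LITERACY_COLUMNS].corrwith(frame["sex_ratio"])


# Rows from df.head()
demo = pd.DataFrame({
    "sex_ratio": [890, 928, 982, 902, 1013],
    "effective_literacy_rate_total": [79.86, 91.99, 80.51, 83.46, 68.38],
    "effective_literacy_rate_male": [85.49, 94.77, 88.18, 89.98, 76.58],
    "effective_literacy_rate_female": [73.59, 89.00, 72.73, 76.23, 60.33],
})
print(sex_ratio_literacy_corr(demo).round(3))
```

On the notebook data the call would be sex_ratio_literacy_corr(df).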
# sunburst of sex_ratio for the cities in top20 (here the 20 cities with the most graduates)
fig = px.sunburst(top20, values='sex_ratio', names='name_of_city', title='Sex ratio of the top 20 cities by number of graduates')
fig.show()
In the bar chart of the 10 most populous cities, the cities of MAHARASHTRA add up to the largest population and those of RAJASTHAN to the smallest.
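That bar chart ranks individual cities and labels them by state, so each state's bar only counts its cities that made the top 10. To rank the states themselves (question 12), the city populations can be summed per state first. This is a minimal sketch with groupby; it covers only the cities present in this dataset, not each state's full population. The helper name and the demo frame are hypothetical.

```python
import pandas as pd


def top_states_by_population(frame: pd.DataFrame, n: int = 10) -> pd.Series:
    """Sum population_total over the cities of each state and keep the n largest."""
    return frame.groupby("state_name")["population_total"].sum().nlargest(n)


demo = pd.DataFrame({
    "state_name": ["PUNJAB", "MAHARASHTRA", "ANDHRA PRADESH", "JHARKHAND", "ANDHRA PRADESH"],
    "population_total": [145238, 112293, 117388, 173988, 166537],
})
print(top_states_by_population(demo, n=3))
```

The result can be plotted with px.bar(top_states_by_population(df).reset_index(), x="state_name", y="population_total").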
#sorting the dataset by 'effective_literacy_rate_total' column and store the largest 20 values in Top20_total
data_total = df.sort_values(by=['effective_literacy_rate_total'])
Top20_total = data_total.tail(20)
# drawing line plot for name_of_city depend on effective_literacy_rate_total value
fig = px.line(Top20_total, x="name_of_city", y="effective_literacy_rate_total",
line_shape="spline", render_mode="svg")
fig.show()
data_total = df.sort_values(by=['sex_ratio'])
Top20_total = data_total.tail(20)
# drawing line plot for name_of_city depend on sex_ratio
fig = px.line(Top20_total, x="name_of_city", y="sex_ratio",
line_shape="spline", render_mode="svg")
fig.show()
# male and female graduates of the first 20 rows of df (df is sorted by population_total, descending)
MG=df['male_graduates'].head(20)
FG=df['female_graduates'].head(20)
cities=df['name_of_city'].head(20)
# set the figure size before drawing so it applies to this plot
plt.figure(figsize=(25,10))
# x positions of the bar groups and the width of each bar
n=20
height=np.arange(n)
width = .4
# male bars in grey, female bars in orange, placed side by side
plt.bar(height,MG, color = 'grey',width = width, edgecolor = 'black',label='Male Graduates')
plt.bar(height + width, FG, color = 'orange',width = width, edgecolor = 'black',label='Female Graduates')
# plot labels and title
plt.xlabel("City",fontsize=20)
plt.ylabel("Number of graduates",fontsize=20)
plt.title("Male VS Female Graduates in the 20 most populous cities",fontsize=20)
# center each city name under its pair of bars and set the label font size
plt.xticks(height + width/2, cities, rotation=90)
plt.tick_params(axis='both', labelsize=12)
plt.legend(fontsize=15)
plt.show()
# count the cities in each state, in the order each state first appears in state_name
count=[]
stateName=df['state_name'].tolist()
uniqueState=df['state_name'].drop_duplicates()
# loop over the states and count each one's occurrences in the state_name column
for x in uniqueState:
    count.append(stateName.count(x))
# set the figure size before drawing so it applies to this plot
plt.figure(figsize=(30,20))
# y positions of the bars (barh centers each bar on its position by default)
n=len(count)
height = np.arange(n)
# bar properties
plt.barh(height, count, color = 'b', edgecolor = 'black',label='No. of cities')
# labels and title
plt.xlabel("count",fontsize=20)
plt.ylabel("state",fontsize=20)
plt.title("Number of cities in each state",fontsize=20)
# label each bar with its state name (no offset, since the bars are centered)
plt.yticks(height, uniqueState)
plt.tick_params(axis='both',labelsize=20)
plt.show()
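The counting loop above calls list.count once per state, scanning the whole list each time. Series.value_counts produces the same per-state counts in one pass. This is a minimal sketch that checks the two approaches agree on a small made-up list; the helper names are hypothetical.

```python
import pandas as pd


def counts_with_loop(states: pd.Series) -> dict:
    """Per-state counts built the same way as the loop above."""
    state_list = states.tolist()
    return {state: state_list.count(state) for state in states.drop_duplicates()}


def counts_with_value_counts(states: pd.Series) -> dict:
    """Per-state counts using pandas directly."""
    return states.value_counts().to_dict()


states = pd.Series(["UTTAR PRADESH", "PUNJAB", "UTTAR PRADESH", "BIHAR", "UTTAR PRADESH"])
print(counts_with_value_counts(states))
assert counts_with_loop(states) == counts_with_value_counts(states)
```

For the chart itself, df['state_name'].value_counts().plot.barh() would draw the same bars, sorted by count.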
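# totals for the last 20 rows of df, i.e. the 20 least populous cities (df is sorted by population_total, descending)
PT=df['population_total'].tail(20)
PT6=df['0-6_population_total'].tail(20)
GT=df['total_graduates'].tail(20)
LT=df['literates_total'].tail(20)
cities=df['name_of_city'].tail(20)
# the 0-6, graduate and literate counts are subsets of the population,
# so draw the four totals side by side instead of stacking them on top of each other
x = np.arange(len(cities))
width = 0.2
plt.figure(figsize=(25,10))
plt.bar(x - 1.5*width, PT, width=width, color='red')
plt.bar(x - 0.5*width, PT6, width=width, color='b')
plt.bar(x + 0.5*width, GT, width=width, color='grey')
plt.bar(x + 1.5*width, LT, width=width, color='lavender', edgecolor='black')
# labels and title
plt.xticks(x, cities, rotation=90)
plt.xlabel("city")
plt.ylabel("Totals")
plt.legend(["Population total", "Less than 6 population total", "Graduates total", "Literates total"])
plt.title("Comparison between population total and subtotal groups of each city")
plt.show()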
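import plotly.express as px
# a choropleth needs district boundary polygons (GeoJSON), which this dataset does not include,
# and dist_code is only unique within a state; plot the graduates as points on the map instead
fig = px.scatter_mapbox(top20, lat="lat", lon="lon", color="total_graduates", size="total_graduates",
                        hover_name="name_of_city", center={"lat": 20.5937, "lon": 78.9629},
                        mapbox_style="carto-positron", zoom=3)
fig.show()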
import plotly.express as px
# geographic scatter of the same cities, colored and sized by total_graduates
# (df is not overwritten here, so the cities data stays available)
fig = px.scatter_geo(top20, lat="lat", lon="lon", color="total_graduates", size="total_graduates",
                     hover_name="name_of_city", scope="asia")
fig.show()
from bokeh.plotting import figure, output_file, show
# note: bokeh.tile_providers / get_provider is the Bokeh 2.x tile API and is deprecated in Bokeh 3.x
from bokeh.tile_providers import CARTODBPOSITRON, get_provider
output_file("tile.html")
tile_provider = get_provider(CARTODBPOSITRON)
# range bounds in Web Mercator metres, chosen to cover India (about 68-98 E, 6-37 N)
p = figure(x_range=(7500000, 10900000), y_range=(650000, 4450000),
           x_axis_type="mercator", y_axis_type="mercator")
p.add_tile(tile_provider)
show(p)
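The tile map works in Web Mercator metres (x_axis_type="mercator"), while the dataset stores degrees in lat and lon, so city points need converting before they can be drawn on it. This is a minimal sketch of the standard spherical Web Mercator (EPSG:3857) formulas, assuming an earth radius of 6378137 m; the helper name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def to_web_mercator(lat_deg: float, lon_deg: float) -> tuple:
    """Convert latitude/longitude in degrees to Web Mercator x, y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Abohar's coordinates from the location column
print(to_web_mercator(30.1452928, 74.1993043))
```

Applied to the notebook data: x, y = zip(*(to_web_mercator(a, b) for a, b in zip(top20["lat"], top20["lon"]))), followed by p.scatter(x, y, size=8) before show(p).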
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.plotting import gmap
output_file("gmap.html")
# center the map on India (the original example was centered on Austin)
map_options = GMapOptions(lat=20.5937, lng=78.9629, map_type="roadmap", zoom=4)
# For GMaps to function, Google requires you obtain and enable an API key:
#
# https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
p = gmap("GOOGLE_API_KEY", map_options, title="India")
# the data source must contain the "lat" and "lon" columns the glyph refers to;
# an empty ColumnDataSource raises the E-1001 BAD_COLUMN_NAME validation error
source = ColumnDataSource(top20[["lat", "lon"]])
p.circle(x="lon", y="lat", size=15, fill_color="blue", fill_alpha=0.8, source=source)
show(p)
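import plotly.express as px
# geographic scatter of the same cities, colored by state and sized by population
# (uses the cities data instead of the unrelated px.data.gapminder() sample, which would overwrite df)
fig = px.scatter_geo(top20, lat="lat", lon="lon", color="state_name", hover_name="name_of_city",
                     size="population_total", projection="natural earth", scope="asia")
fig.show()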